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1 Lattice QCD and NMR Spectroscopy 

The Lattice QCD (LQCD) community has occasionahy gone through peri- 
ods of self-examination of its data analysis methods and co mpared them with 
methods used in other disciplines jTou9d. iMicOl lMM95j . This process has 
shown that the techniques widely used elsewhere may also be useful in an- 
alyzing LQCD data. It seems that we are in such a period now with many 
groups trying what are generally calle d Bayesian methods such as Maximal 
Entropy (MEM) or constrained fitting |NAH99l Il+oI IABC02L iFieOi ID+O.l 



and many others]. In these proceedings we will attempt to apply this process 
to a comparison of data modeling techniques used in LQCD and NMR Spec- 
troscopy to see if there are methods which may also be useful when applied 
to LQCD data. 

A common problem in Lattice QCD is the estimation of hadronic energies 
Ek{p) k ^ 1 ■ ■ ■ K states from samples of the hadronic correlation function 
of a specified set of quantum numbers computed in a Monte Carlo simulation. 
A typical model function is 

K 

C(p,i„) = ^Afe(p)exp[-(io + na)£^fc(p)] (1) 

k=l 

Ak,EkeR, 0<Ei<---<Ek< Ek+i <---<Ek 

where one of the quantum numbers, the spatial momentum p, is shown ex- 
plicitly for illustration. The correlation function is estimated at each time t„, 
n = • • • A^ — 1 with the N chosen such that {E2 — Ei)tN-i '> 1. This enables 
the ground state energy Ei to be easily determined from the large time behav- 
ior. To accurately estimate the k-th energy level requires choosing a sampling 
interval a^^ 3> E^ — Ei. Unfortunately, computational constraints typically 
force us to choose time intervals larger (a~^ ~ 2 GeV) and number of time 
samples smaller {N ~ 32) than is ideally preferred. 
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In an idealized nuclear magnetic resonance (NMR) spectroscopy^ exper- 
iment, a sample is placed in an external magnetic field and a transient field 
from an RF coil is used to temporarily drive the various nuclei into a non- 
equilibrium distribution of magnetic spin states. Then, as the sample relaxes 
back to its equilibrium distribution, each type of excited nuclei radiates at 
a characteristic frequency fk- The sum of all these microscopic signals are 
picked up by another RF coil, giving rise to the free induction decay (FID) 
signal 

K 

Vn^Y. afce^*''e(-^'=+'2.A,)t„ + „^ e [0, TV - 1] (2) 
fc=i 

Ofc, 0fc, dfc, fk e K; rffe > 0; noise : e„ G C. 

As the frequencies are known a priori, an experienced operator can incor- 
porate this prior knowledge by Fourier transforming the data and matching 
Lorentzian peaks against existing databases. Bayesian methods are then used 
to constrain the frequencies, enabling the estimation of the other parameters. 
Of particular interest are the amplitudes afc, which are related to the number 
of various nuclei in the sample, and the damping rates dfc, which are related 
to the mobility and molecular environment of the nuclei. 
Both Eqs. (P) and can be written in the form 

K 

yn = Y akal (3) 

k=l 

or in matrix notation y — a. In numerical analysis, this is known as 

a Vandermonde system and $ is a Vandermonde matrix. Note also that all 
the parameters a. which enter non-linearly in the model only appear in the 
Vandermonde matrix and the remaining linear parameters in a. This suggests 
that if the best fit values of only the non-linear parameters, S, were known 
a priori then the remaining best fit values of the linear parameters, a could 
be determined using a linear least squares algorithm. Hence, linear and non- 
linear parameters need not be determined simultaneously and in Sec. [3 we 
will discuss the best known algorithm that exploits this feature. 

We have found that all of the model functions we use to fit hadronic cor- 
relations in LQCD can be written in the Vandermonde form. For a less trivial 
example, here is the model function for mesonic correlations with periodic 
(or anti-periodic) temporal boundary conditions and either Wilson (cr=l) or 
staggered (cr=-l) fermions 

K 

C{Tn) = ^^■"^fce""'^^'"/' cosh(an^fc), < Eu < Ek+2. (4) 
fc=i 

^In medical applications, MRS is a preferred abbreviation, probably to avoid the 
perceived public aversion to anything nuclear. 
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In this case, if we choose = a'' cosh(a£'fc) to be the parameters of the 
Vandermonde matrix $ then we can construct the data vector y from the 
correlation data 

yn = ^J:(''~'Wn-.,-^). (5) 

j=o \ -J y 

where (") are binomial coefficients. 

In NMR spectroscopy and in LQCD, fitting data often requires an experi- 
enced user to interact with the fitting program, i.e. to provide initial guesses to 
the minimizer or to choose what prior knowledge may be used to constrain the 
minimization, and this can often be a time-consuming process if the data are 
of marginal quality. In LQCD fitting programs, the effective mass technique 
is often used to provide non-interactive initial guesses to the minimizer. In 
NMR spectroscopy, more general analogues, called black box methods, have 
been developed for situations where an expert user is unavailable or the rate 
of data acquisition precludes interaction. In Sec. |3 we will look at the gener- 
alization of the effective mass technique, which will lead to a Hankel system 
that must be solved. 



2 VARPRO: Variable Projection algorithm 

In Sec. we considered data whose model may be written as y = $a, as 
in Eq. with the data vector y S and the linear parameter vector 
a G and N > 2K is necessary for the problem to be over-determined. 
The non-linear parameter vector a is used to determine the components of 
the non-linear parameter matrix $ S M.^^^ of the general form 

a) • • • <j)Kiti,a) \ 

; ••. ; • (6) 

(j)l{tN,Ot) ■ ■ ■ (t>K{tN,Ot) J 

Non-linear least squares problems of this type form a special class known as 
separable non-linear least squares and have been well studied in the numerical 
analysis community for the past thirty years. 

To see how this special structure can be exploited, recall the least squares 
functional to be minimized is 

r2(a,a) = |y-*(a)a^ (7) 

Now, suppose we were given a priori the value of the non-linear parameters 
a. at the minimum of Eq. ^ which we denote a. We can easily determine a 
posteriori the linear parameters a by solving the corresponding linear least 
squares problem. The solution is simply 
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a=*+(a)y (8) 

where ^^{a) is the Moore-Penrose pseudo-inverse of IWeiOJ . Substi- 

tuting Eq. © back into Eq. {T)) we get a new least squares functional that 
depends only on a 

r2(a) = |y-*(a)*+(a)y|'. (9) 

P(a) = $(Q:)$^(a) is the orthogonal projector onto the linear space spanned 
by the column vectors of $(a), so F^{(y.) = 1 — P(q:) is the projector onto the 
orthogonal complement of the column space of Hence, we can rewrite 

Eq. © more compactly as 

rl{a)^\p^{a)y\\ (10) 

This form makes it easier to see why rKa) is comm only c alled the variable 
projection (VARPRO) functional. It has been shown jGP7.'^ | that the minima 
of r2{(y-) and the corresponding values of a from Eq. ((SJ are equivalent to the 
minima of rf (a, a). 

One complication of the VARPRO method is computing the gradient 
dY2/da when the gradie nts d(t>k (tr,.,a)/da are known. The solution is pre- 
sented in some detail in |GP73I | and an excellent FORTRAN implementation 
Bol77| is available in the Netlib Repository. 



From our review of the NMR spectroscopy literature, it appears that the 
VARPRO method, and in particular the Netlib implementation, is competitive 
with the standard least squares method using either the LMDER routine of the 
MINPACK library or the NL2S0L routines of the PORT hbrary, both also available 
in the Netlib Repository. In general, the VARPRO functional requires fewer 
minimizer iterations, but the gradient comp utation is more expensive. Note 
that the Levenberg-Marquardt minimizer in PFTVQ^ performs quite poorly 



relative to these three and we cannot recommend its use in production code. 

Apart from the issue of numerical speed and accuracy of the VARPRO 
method, we see two additional benefits of this method over the standard 
method. First, by reducing the dimensionality of the search space by post- 
poning the determination of a, this also means that starting estimates for a 
are not needed. For LQCD, this is a great benefit, since good guesses for a 
are easily obtained from the black box methods of Sec.|31 Second, when the 
incorporation of Bayesian prior knowledge is desired, for LQCD it seems eas- 
ier to develop reasonable priors for the energies Ek than the amplitudes Ak- 
When using the VARPRO method, only priors for the energies are needed. Of 
course, if reliable priors for the amplitudes are available, one should instead 
use the standard method. Finally, data covariance can easily be incorporated 
in the usual way 

r|(a) = fp^(a)yl^C-i(y) fp^(a)yl . (U) 
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3 Black Box Methods 



3.1 Effective Masses 

The best example of a black box method widely used in LQCD is the method 
of effective masses. Let's consider the problem of Eq. ^ for the case N=2, 
K=l 

' Vn \ _ ( a? \ ,^ ^ ^ „ _ Vn+l „ _ Vn 
Vn+l 



(«l) 



ai 



'1 / Vn "1 

As expected, the problem is exactly determined, so there is an unique zero 
residual solution. For the model function of Eq. ^ the effective mass is mes = 
— log(ai). Note that the non-linear parameter ax is determined first from the 
data and then the linear parameter ai can be determined. This is an indication 
of the separability of the least squares problem discussed in Sec. |21 

As we are unaware of its presentation elsewhere, here is the two-state 
effective mass solution. We start from Eq. Q for N—A, K~2 



( Vn 


\ 
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1 \ 


Vn^ 


hi 




ai 




Vn- 


h2 
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\Vn~ 











aia'{ 
a2a2 



If we compute three quantities from the data 

A = y^+i - VnVn+2 

B = VnVn+S - yn+lVn+2 

C = vl+2 - Vn+lVn+d. 



(13) 



(14) 
(15) 
(16) 



then the two solutions for the non-linear parameters au come from the familiar 
quadratic equation 

"^■^ = 2A • 

As before, the linear parameters ai^2 can also be determined once the non- 
linear parameters are known 



Vn ± 



V(S2 - 4AC)[4A3 + (52 - AAC)yl] 



- AAC 



(18) 



where some care must be taken to properly match solutions. 

In general, when N=2K there should always be such a unique zero resid- 
ual solution. From inspection of Eq. H13|l the iV=4, K=2 problem is a set of 4 
coupled cubic equations. Unfortunately, due to Abel's Impossibility Theorem 
[Abc26], we should expect that general algebraic solutions are only possible 
for N<5. Yet, the rather surprising result of Ea. p7|l is that after properly sep- 
arating the non- linear parameters ak, the iV=4, K=2 problem is of quadratic 
order. Thus, we suspect that it is also possible to find algebraic solutions to 
the three-state and four-state effective mass problems when properly reduced 
to cubic and quartic equations after separation of variables. 
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3.2 Black Box I: Linear Prediction 



In order to compute solutions of Eq. Q when the system is over-determined 
{N > 2K) or when an algebraic solution is not available, we consider the first 
black box method called linear prediction. We form a K-th order polynomial 
with the ak as roots 



K 



K 



PiCt 



[po = 1). 



(19) 



fc=i 



i=0 



Since p{ak) = the following is true 



K 



i=l 



m > K. 



(20) 



When Eq. ^ is substituted in Eq. lO we find the following relation 



Vn 



K 



^Pk Vr, 



m > K. 



(21) 



Because Eq. (|21ll enables us to "predict" the data ym at larger times in terms 
of the data ym-K,''' ,2/m-i at earlier times, the pk are commonly called 
forward linear prediction coefficients. 

Using Eq. (|21|l we can construct the linear system hip — — HipP 



/ VM \ 

VM+-i 



I 



yo 
yi 



VM 



\ 


( PM \ 




Pm-1 


J 


[ PI ) 



N > 2M. 



(22) 



In numerical analysis, this is known as a Hankel system and the matrix Hip 
is a Hankel matrix. After solving Eq. H22() for p, the roots of the polynomial 
of Eq. H19|l are computed to determine the parameters ak . The ak parameters 
can subsequently be determined from Eq. (jHJ. 

In the presence of noisy data, the equality in Eq. (|22|l is only approximate, 
even for the case N = 2M , so some minimization method like least squares 
must be used. This doesn't mean that the parameter estimates from linear 
prediction agree with t he param eter estimates from the least squares methods 
of Sec. 121 Gauss proved Gau23j that the least squares estimates of fit parame- 
ters for linear problems have the smallest possible variance. In this sense, least 
squares estimates are considered optimal although we know of no proof that 
this holds for non-linear problems. Since linear prediction estimates may not 
agree with least squares, they are considered sub-optimal even though there 
is no proof that the variance is larger, due to non-linearity. 
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A popular method for solving Eq. (|22|l is the LPSVD algorithm |KT82f . 
In this method, we construct Hip for M as large as possible, even if we are 
only interested in estimating K < M parameters. After computing the SVD 
of Hip, we construct a rank K approximation Hipx 

Hip = = (UxUa) ( S2 ) (^^^2)^ , HipK = UkS^vJ^ (23) 

Sa' contains the K largest singular values. By zeroing S2 to reduce the rank 
of Hip, much of the statistical no ise is e l iminate d from the problem. From the 
Eckart-Young-Mirsky theorem |EY8fi[ 'MirGO] , this rank K approximation 
is the nearest possible under either the Frobcnius norm or matrix 2-norm. 
Then, after solving hip = — Hip^-p for the p,„ coefhcients, the M roots of the 
polynomial in Eq. (|19|l are computed using a root-finding algorithm. Since the 
rank of Hip was reduced to if, only K roots are valid parameter estimates. 
Typically, the K largest magnitude roots are chosen. 

Our experience with this algorithm is that the largest magnitude roots 
often have unphysical values if K is set larger than a reasonable number given 
the statistical precision of the data. There are also several issues which may 
be of some concern. First, we found that root-finding algorithms all come 
with caveats about stability and susceptibility to round-off error and must be 
treated with some care. Also, since statistical noise is present on both sides of 
Eq. (|22|l . the rank- reduced least squares solution is probably not appropriate 
and one should probably use an errors-in-variables (EIV) approach like total 
least squares (TLS), which we will describe in Sec. rTlH We have found that 
the TLS variant of LPSVD, called LPTLS |TM89I |. gives better parameter 
estimates than LPSVD. 



3.3 Total Least Squares 

In the standard linear least squares problem Ax w b 

minimize 1 1 Ax -blL, A e M"""-^', b e M^, TV > if (24) 

an important assumption for the solution, i.e. Eq. lO, to be considered op- 
timal is that the only errors are in the data vector b and further that those 
errors are i.i.d. (independent and identically distributed). When errors also oc- 
cur in A, as in Eq. (|22|l . then a new approach, often called errors-in-variables 
(EIV), is required to restore optimality. Note that the errors in A that cause 
the loss of optimality need not be purely statistical: numerical round-off er- 
rors or choosing to fit a model function which differs from the "true" model 
function are potential sources of error which could cause loss of optimality. 

To understand the total least squares (TLS) solution to the EIV problem, 
consider the case when a zero residual solution to Eq. I|24(l exists. Then, if we 
add b as a column of A, written [Ab], it cannot have greater column rank 
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than A because b e Ran (A). If we compute the SVD of [Ab] we will find 
that the singular value ax+i — 0. When the solution of Eq. (f^H) has non-zero 
residual, we may find the singular value aK + 1 of [Ab] to be non-zero as 
well. But, we can construct the nearest rank R < K approximation to [Ab] 
(in the sense of the Eckart-Young-Mirsky theor em) and this gi ves us the TLS 
solution. The TLS solution was computed in \CrT\7(l lOoiy.'^ . althou gh the 
name was not coined until fGVSQl] . A comprehensive review of the 

subject is available. 

Finally, TLS is very sensitive to the distribution of errors in [Ab] . If the 
errors are not known to be i.i.d. then it is crucial to scale the matrix before 
using the TLS algorithm . If the data are uncorrelated, then a method known 
as "equilibrium" scaling Bau63 l] is sufficient. If the data are correlated, then 
Cholesky factors of the covariance matrix must be used. In this case, it is 
better to use either the generahzed TLS algorithm (GTLS) |Van90al IVan90h| ] 
or the restricted TLS algorithm (RTLS) |VZ911] which are more robust when 
the covariance matrix is ill-conditioned. Imple mentatio ns of various TLS al- 
gorithms are available in the Netlib Repository |Van88j . 



3.4 Black Box II: State Space Methods 

The name for these methods is derived from state-space theory in the control 
and identification literature KAB83]. The basic approach is to compute the 
non-linear parameters ak of Eq. Q without needing to compute the roots of 
a polynomial, as in Sec. 13.21 From Eq. (|22(l . we start by noting that = 
[Hiphip] is also a Hankel matrix 



• • • VAI-l 



yN-2 



VM 



M> K, N- M > K 



A Vandermonde decomposition exists for this matrix 



(25) 



SAT^ = 



1 

ai 

N-M-l 



1 

aK 



ai 



( 1 

ai 



aK 



V 



1 \ 

aK 



(26) 



in terms of the linear (ak) and non-linear (ak) parameters of Eq. (j^J. If we 
could compute this decomposition directly, then the problem would be solved. 
Alas, no such algorithm is currently known. 

An indirect method exists to compute this decomposition cal led Hanke l 
SVD (HSVD). We will consider here a TLS variant called HTLS |VCDV94| . 
First, we note the shift invariancc property of S (and similarly for T) 



^= diag(Q!i,- • • ,aK) 



(27) 
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Next, we note that if such a decomposition is possible, then S, A and T are 

all of rank K by inspection, so is at least of rank K, as well. So, using 
SVD we construct the nearest rank K approximation to 

= (UaUs) ( o) (^^^'"^2)^ = UaSkV]^ (28) 
By comparing the decompositions of Eq. (|26|l and Eq. ()28|l we can see 

Span(S) = Span(UK) =^ Ua' = SQ =^ = UaxQ^^^Q (29) 



So, computing the TLS solution of 



will give us Q »4,Q, which we 



can then diagonalize using an eigenvalue solver to get our estimates of ak- 
in our experience with these black box methods, the HTLS algorithm 
seems to be the most robust. However, we would like to emphasize two points. 

First, the estimates of ak from HTLS are considered sub-optimal because 
HsA" in Eq- H28|) is only approximately, but not exactly^ a Hankel matrix be- 
cause the SVD does not enforce the Hankel structure throughout. A similar 



U' U 



A'-'^i 



Structured 



problem occurs while constructing the TLS solution of 

TLS algorithms (STLS ) exist which can construct H^a while preserving the 
Hankel structure (see |Van99l| for references) and hence restoring the opti- 
mality of the estimates. While we have not yet tried these STLS algorithms, 
we note that all of them involve iterative procedures to restore the structure. 
Thus, under the "no free lunch" theorem, we suspect that the price of restor- 
ing optimality is roughly equivalent to performing the (optimal) non-linear 
least squares minimizations described in Sec. |21 

Our second observation is that LQCD data is always correlated, so that 
a GTLS or RTLS algorithm is needed to compute the TLS solution of 

V]^Vki ■ But, covariance estimates of Ua are not readily computed from 
the data covariance matrix because of the required SVD. Thus, a jackknife 
or bootstrap resampling method is required to estimate cov(Ua). Since we 
expect to use a resampling method to estimate the covariance of the ak, this 
means that there is an inner and outer resampling loop so the method can eas- 
ily become computationally expensive if the number of data samples becomes 
large. In this case, blocking the data is recommended. 



4 Conclusions 

We have found that reviewing the literature of other fields where data analysis 
of exponentially damped time series is also prevalent to be quite fruitful. Our 
review has discovered several mature analysis methods which are virtually 
unknown (or unmentioned) in the Lattice QCD literature. We have performed 
several tests of all the methods discussed on fake data and on some actual 
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LQCD data are encouraged by the results. So, we are incorporating these 
techniques into our production versions of analysis programs and expect to 
report results soon. 

Finally, we would li ke to a cknowledge that we have found Leentje Van- 
hamme's Ph.D. Thesis Van99j an extremely useful guide to the literature of 
the NMR spectroscopy community. We would encourage anyone interested in 
learning more to start there. An electronic copy is currently available online. 

This work was supported in part by DOE contract DE-AC05-84ER40150 
under which the Southeastern Universities Research Association (SURA) op- 
erates the Thomas Jefferson National Accelerator Facility. 
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